AITopics | Puno Department

Quechua Speech Datasets in Common Voice: The Case of Puno Quechua

Huaman, Elwin, Huaman, Wendi, Huaman, Jorge Luis, Quispe, Ninfa

arXiv.org Artificial Intelligence, Oct-17-2025

Under-resourced languages, such as the Quechua languages, face data and resource scarcity, hindering their development in speech technology. To address this issue, Common Voice presents a crucial opportunity to foster open, community-driven speech dataset creation. This paper examines the integration of Quechua languages into Common Voice. We detail the current 17 Quechua languages, presenting Puno Quechua (ISO 639-3: qxp) as a focused case study that includes language onboarding and corpus collection of both reading and spontaneous speech data. Our results demonstrate that Common Voice now hosts 191.1 hours of Quechua speech (86% validated), with Puno Quechua contributing 12 hours (77% validated), highlighting Common Voice's potential. We further propose a research agenda addressing technical challenges, alongside ethical considerations for community engagement and indigenous data sovereignty. Our work contributes towards inclusive voice technology and digital empowerment of under-resourced language communities.

artificial intelligence, quechua, speech recognition, (13 more...)

arXiv: 2510.13871

Country: South America > Peru > Puno Department > Puno Province > Puno (0.91)

Genre: Research Report > New Finding (0.54)

Industry: Government (0.68)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)

Optimization of Energy Consumption Forecasting in Puno using Parallel Computing and ARIMA Models: An Innovative Approach to Big Data Processing

Vilca-Tinta, Cliver W., Torres-Cruz, Fred, Quispe-Morales, Josefh J.

arXiv.org Machine Learning, Jul-27-2024

This research presents an innovative use of parallel computing with the ARIMA (AutoRegressive Integrated Moving Average) model to forecast energy consumption in Peru's Puno region. The study conducts a thorough and multifaceted analysis, focusing on the execution speed, prediction accuracy, and scalability of both sequential and parallel implementations. A significant emphasis is placed on efficiently managing large datasets. The findings demonstrate notable improvements in computational efficiency and data processing capabilities through the parallel approach, all while maintaining the accuracy and integrity of predictions. This new method provides a versatile and reliable solution for real-time predictive analysis and enhances energy resource management, which is particularly crucial for developing areas. In addition to highlighting the technical advantages of parallel computing in this field, the study explores its practical impacts on energy planning and sustainable development in regions like Puno.

consumption, energy consumption, implementation, (16 more...)

arXiv: 2408.00014

Country:

South America > Peru > Puno Department > Puno Province > Puno (0.86)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.64)
South America > Argentina (0.04)
(13 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.40)
Overview > Innovation (0.40)

Industry:

Information Technology (1.00)
Energy > Power Industry (1.00)
Energy > Renewable > Hydroelectric (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Architecture (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.51)

Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem

Court, Sara, Elsner, Micha

arXiv.org Artificial Intelligence, Jun-21-2024

This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline. We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of information retrieved from a constrained database of digitized pedagogical materials (dictionaries and grammar lessons) and parallel corpora. Using both automatic and human evaluation of model output, we conduct ablation studies that manipulate (1) context type (morpheme translations, grammar descriptions, and corpus examples), (2) retrieval methods (automated vs. manual), and (3) model type. Our results suggest that even relatively small LLMs are capable of utilizing prompt context for zero-shot low-resource translation when provided a minimally sufficient amount of relevant linguistic information. However, the variable effects of prompt type, retrieval method, model type, and language-specific factors highlight the limitations of using even the best LLMs as translation systems for the majority of the world's 7,000+ languages and their speakers.

computational linguistic, information, translation, (15 more...)

arXiv: 2406.15625

Country:

North America > Canada > Ontario > Toronto (0.05)
North America > United States > Ohio (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information

Taguchi, Chihiro, Saransig, Jefferson, Velásquez, Dayana, Chiang, David

arXiv.org Artificial Intelligence, Apr-23-2024

This paper presents Killkan, the first dataset for automatic speech recognition (ASR) in the Kichwa language, an indigenous language of Ecuador. Kichwa is an extremely low-resource endangered language, and there have been no resources before Killkan for Kichwa to be incorporated in applications of natural language processing. The dataset contains approximately 4 hours of audio with transcription, translation into Spanish, and morphosyntactic annotation in the format of Universal Dependencies. The audio data was retrieved from a publicly available radio program in Kichwa. This paper also provides corpus-linguistic analyses of the dataset with a special focus on the agglutinative morphology of Kichwa and frequent code-switching with Spanish. The experiments show that the dataset makes it possible to develop the first ASR system for Kichwa with reliable quality despite its small dataset size. This dataset, the ASR model, and the code used to develop them will be publicly available.

dataset, kichwa, transcription, (15 more...)

arXiv: 2404.15501

Country:

South America > Bolivia (0.04)
South America > Peru > Puno Department > Puno Province > Puno (0.04)
South America > Ecuador > Pichincha Province > Quito (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

ELSA -- Enhanced latent spaces for improved collider simulations

Nachman, Benjamin, Winterhalder, Ramon

arXiv.org Machine Learning, Oct-21-2023

Simulations play a key role for inference in collider physics. We explore various approaches for enhancing the precision of simulations using machine learning, including interventions at the end of the simulation chain (reweighting), at the beginning of the simulation chain (pre-processing), and connections between the end and beginning (latent space refinement). To clearly illustrate our approaches, we use W+jets matrix element surrogate simulations based on normalizing flows as a prototypical example. First, weights in the data space are derived using machine learning classifiers. Then, we pull back the data-space weights to the latent space to produce unweighted examples and employ the Latent Space Refinement (LASER) protocol using Hamiltonian Monte Carlo. An alternative approach is an augmented normalizing flow, which allows for different dimensions in the latent and target spaces. These methods are studied for various pre-processing strategies, including a new and general method for massive particles at hadron colliders that is a tweak on the widely-used RAMBO-on-diet mapping. We find that modified simulations can achieve sub-percent precision across a wide range of phase space.

artificial intelligence, deep learning, machine learning, (18 more...)

doi: 10.1140/epjc/s10052-023-11989-8

arXiv: 2305.07696

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York (0.04)
South America > Peru > Puno Department (0.04)
(4 more...)

Genre: Research Report (0.82)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Evaluating Self-Supervised Speech Representations for Indigenous American Languages

Chen, Chih-Chen, Chen, William, Zevallos, Rodolfo, Ortega, John E.

arXiv.org Artificial Intelligence, Oct-8-2023

The application of self-supervision to speech representation learning has garnered significant interest in recent years, due to its scalability to large amounts of unlabeled data. However, much progress, both in terms of pre-training and downstream evaluation, has remained concentrated in monolingual models that only consider English. Few models consider other languages, and even fewer consider indigenous ones. In our submission to the New Language Track of the ASRU 2023 ML-SUPERB Challenge, we present an ASR corpus for Quechua, an indigenous South American language. We benchmark the efficacy of large SSL models on Quechua, along with 6 other indigenous languages such as Guarani and Bribri, on low-resource ASR. Our results show surprisingly strong performance by state-of-the-art SSL models, showing the potential generalizability of large-scale models to real-world data.

computational linguistic, indigenous language, quechua, (13 more...)

arXiv: 2310.03639

Country:

South America > Brazil (0.05)
North America > Canada > Ontario > Toronto (0.05)
South America > Bolivia (0.05)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.49)

An Integrated NPL Approach to Sentiment Analysis in Satisfaction Surveys

Pinto-Luque, Edson B.

arXiv.org Artificial Intelligence, Aug-1-2023

The research project aims to apply an integrated approach to natural language processing (NLP) to satisfaction surveys. It will focus on understanding and extracting relevant information from survey responses, analyzing feelings, and identifying recurring word patterns. NLP techniques will be used to determine emotional polarity, classify responses into positive, negative, or neutral categories, and use opinion mining to highlight participants' opinions. This approach will help identify the most relevant aspects for participants and understand their opinions in relation to those specific aspects. A key component of the research project will be the analysis of word patterns in satisfaction survey responses using NLP. This analysis will provide a deeper understanding of the feelings, opinions, themes, and trends present in respondents' responses. The results obtained from this approach can be used to identify areas for improvement, understand respondents' preferences, and make strategic decisions based on analysis to improve respondent satisfaction.

application, sentiment analysis, workshop course, (12 more...)

arXiv: 2307.11771

Country: South America > Peru > Puno Department > Puno Province > Puno (0.05)

Genre: Questionnaire & Opinion Survey (1.00)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.94)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Comparative Analysis of Libraries for the Sentimental Analysis

Ccoya, Wendy, Pinto, Edson

arXiv.org Artificial Intelligence, Jul-26-2023

This study's main goal is to provide a comparison of libraries using machine learning methods. Experts in natural language processing (NLP) are becoming more and more interested in sentiment analysis (SA) of text changes. The objective of employing NLP text analysis techniques is to recognize and categorize feelings related to Twitter users' utterances. In this examination, issues with SA and the libraries utilized are also looked at, and a number of cooperative methods to classify emotional polarity are provided. The Naive Bayes Classifier, Decision Tree Classifier, Maxent Classifier, Sklearn Classifier, Sklearn Classifier MultinomialNB, and other conjoint learning algorithms, according to recent research, are very effective. Five Python and R libraries (NLTK, TextBlob, Vader, Transformers (GPT and BERT pretrained), and Tidytext) will be used in the study to apply sentiment analysis techniques. Four machine learning models, Decision Tree (DT), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbor (KNN), will also be used. To evaluate how well libraries for SA operate in the social network environment, a comparative study was also carried out. The measures used to assess the best algorithms in this experiment, which used a single data set for each method, were precision, recall, and F1 score. We conclude that the BERT transformer method, with an accuracy of 0.973, is recommended for sentiment analysis.

machine learning, natural language, sentiment analysis, (15 more...)

arXiv: 2307.14311

Country:

South America > Peru > Puno Department > Puno Province > Puno (0.05)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

QICHWABASE: A Quechua Language and Knowledge Base for Quechua Communities

Huaman, Elwin, Lindemann, David, Caruso, Valeria, Huaman, Jorge Luis

arXiv.org Artificial Intelligence, Apr-29-2023

Over the last decade, the Web has increasingly become a space of language and knowledge representation. However, this is only true for widely spoken languages and well-established communities, while minority communities and their resources have received less attention. In this paper, we propose QICHWABASE to support the harmonization process of the Quechua language and knowledge, and its community. To do so, we adopt methods and tools that could become a game changer in favour of Quechua communities around the world. We conclude that the methodology and tools adopted in building QICHWABASE, which is a Wikibase instance, could enhance the presence of minorities on the Web.

artificial intelligence, expert system, qichwabase, (10 more...)

arXiv: 2305.06173

Country:

South America > Peru > Puno Department > Puno Province > Puno (0.05)
Europe > Spain > Basque Country (0.05)
Europe > Italy (0.05)
(3 more...)

Genre: Research Report (0.72)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.47)

Machine Learning Applied to Peruvian Vegetables Imports

Ticona-Salluca, Hugo, Torres-Cruz, Fred, Tumi-Figueroa, Ernesto Nayer

arXiv.org Artificial Intelligence, Jan-8-2023

The current research work is being developed to train and evaluate the performance of a predictive model applied to the imports of vegetable products into Peru using artificial intelligence algorithms, specifically the machine learning models LSTM and PROPHET. The forecast is made with data from the monthly record of imports of vegetable products (in kilograms) from Peru, collected from the years 2021 to 2022. As part of applying the training methodology for machine learning algorithms, an appropriate dataset is explored and constructed according to the parameters of a time series. Subsequently, the model with better performance will be selected, evaluating the precision of the predicted values so that they are sufficiently reliable to be considered a useful resource in the forecast of imports in Peru.

artificial intelligence, information, machine learning, (17 more...)

arXiv: 2301.03587

Country:

South America > Peru > Puno Department > Puno Province > Puno (0.06)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)

Genre: Research Report (0.40)

Industry: Energy (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
